Abstract

Background: Modeling biological networks serves as both a major goal and an effective tool of systems biology in 
studying mechanisms that orchestrate the activities of gene products in cells. Biological networks are context-specific 
and dynamic in nature. To systematically characterize the selectively activated regulatory components and mechanisms, 
modeling tools must be able to effectively distinguish significant rewiring from random background fluctuations. 
While differential networks cannot be constructed by existing knowledge alone, novel incorporation of prior 
knowledge into data-driven approaches can improve the robustness and biological relevance of network 
inference. However, the major unresolved roadblocks include: big solution space but a small sample size; highly 
complex networks; imperfect prior knowledge; missing significance assessment; and heuristic structural parameter 
learning. 

Results: To address these challenges, we formulated the inference of differential dependency networks that 
incorporate both conditional data and prior knowledge as a convex optimization problem, and developed an 
efficient learning algorithm to jointly infer the conserved biological network and the significant rewiring across 
different conditions. We used a novel sampling scheme to estimate the expected error rate due to "random" 
knowledge. Based on that scheme, we developed a strategy that fully exploits the benefit of this data-knowledge 
integrated approach. We demonstrated and validated the principle and performance of our method using 
synthetic datasets. We then applied our method to yeast cell line and breast cancer microarray data and obtained 
biologically plausible results. The open-source R software package and the experimental data are freely available at 
http://www.cbil.ece.vt.edu/software.htm. 

Conclusions: Experiments on both synthetic and real data demonstrate the effectiveness of the knowledge-fused 
differential dependency network in revealing the statistically significant rewiring in biological networks. The 
method efficiently leverages data-driven evidence and existing biological knowledge while remaining robust to 
the false positive edges in the prior knowledge. The identified network rewiring events are supported by previous 
studies in the literature and also provide new mechanistic insight into the biological systems. We expect the 
knowledge-fused differential dependency network analysis, together with the open-source R package, to be an 
important and useful bioinformatics tool in biological network analyses. 
Background 

Biological networks are context-specific and dynamic in 
nature [1]. Under different conditions, different regulatory 
components and mechanisms are selectively activated or 
deactivated [2,3]. For example, the topology of the underlying 
biological network changes in response to internal or external 
stimuli, as cellular components exert their functions through 
interactions with other molecular components [4,5]. Thus, in 
addition to asking "which genes are 
differentially expressed", the new question is "which genes 
are differentially connected" [6,7]. Studies on network- 
altering events will shed new light on whether network re- 
wiring is a general principle of biological systems regarding 
disease progression or therapeutic responses [2,3]. More- 
over, due to inevitable experimental noise, snapshots of 
dynamic expression, and post-transcriptional or transla- 
tional/post-translational modifications, systematic efforts 
to characterize biological networks must effectively distin- 
guish significant network rewiring from random back- 
ground fluctuations [1]. 

Almost exclusively using high-throughput gene ex- 
pression data and focusing on conserved biological net- 
works, various network inference approaches have been 
proposed and tested [1], including probabilistic Boolean 
networks [8], state-space models [9,10], and probabilis- 
tic graphical models [11]. However, since these methods 
often assume that there is a static network structure, 
they overlook the inherently dynamic nature of molecu- 
lar interactions, which can be extensively rewired across 
different conditions. Hence, current network models 
only present a conserved cellular network averaging 
across all samples. To explicitly address differential net- 
work analysis [3,5,12], some initial efforts have been re- 
cently reported [1]. In our previous work, Zhang et al. 
proposed to model differential dependency networks 
between two conditions by detecting network rewiring 
using significance tests on local dependencies across 
conditions [13,14], which is a substantially different 
method from the one proposed in this paper where ex- 
perimental data and prior knowledge are jointly mod- 
eled. The approach was successfully extended by Roy 
et al. to learn dynamic networks across multiple condi- 
tions [15], and by Gill et al. to assess the overall evi- 
dence of network differences between two conditions 
using the connectivity scores associated with a gene or 
module [16]. Pioneered and reported in [17], correlation 
and partial correlation are used to construct network 
graphs, and differential pathway analysis is developed 
based on graph edit distance. The temporal evolution of 
network structures is examined with a fused penalty 
term to encode relationship between adjacent time 
points in [18]. Furthermore, recent efforts have also 
been made to incorporate existing knowledge about net- 
work biology into data-driven network inference [19]. 



Wang et al. proposed to incorporate prior knowledge 
into the inference of conserved networks in a single 
condition by adjusting the Lasso penalties [20]. Yet, the 
inherently dynamic wiring of biological networks re- 
mains under-explored at the systems level, as inter- 
action data are typically reported under diverse and 
isolated conditions [1]. 

There are at least five unresolved issues concerning 
differential network inference using data-knowledge in- 
tegrated approaches: (1) the solution (search) space is 
usually large while sample sizes are small, resulting in 
potential overfitting; (2) both conserved and differential 
biological networks are complex and lack closed-form or 
efficient numerical solutions; (3) "structural" model pa- 
rameters are assigned heuristically, leading to potentially 
suboptimal solutions; (4) prior knowledge is imperfect 
for inferring biological networks under specific condi- 
tions, e.g., false positive "connections", biases, and non- 
specificity; and (5) most current methods do not provide 
significance assessment on the differential connections 
and rigorous testing of the type I error rate. 

To address these challenges, we formulated the infer- 
ence of differential dependency networks that incorpor- 
ate both conditional data and prior knowledge as a 
convex optimization problem, and developed an effi- 
cient learning algorithm to jointly infer the conserved 
biological network and the significant rewiring across 
different conditions. Extending and improving our work 
on Gaussian graphical models [21,22], we designed 
block-wise separable penalties in the Lasso-type models 
that permit joint learning and knowledge incorporation 
with an efficient closed-form solution. We estimated the 
expected error rate due to "random" prior knowledge 
via a novel sampling scheme. Based on that scheme, we 
developed a strategy to fully exploit the benefit of this 
data-knowledge integrated approach. We determined 
the values of model parameters that quantitatively cor- 
respond to the expected significance level, and evalu- 
ated the statistical significance of each of the detected 
differential connections. We validated our method using 
synthetic datasets and comprehensive comparisons. We 
then applied our method to yeast cell line and breast 
cancer microarray data and obtained biologically plaus- 
ible results. 

Methods 

Formulation of knowledge-fused differential dependency 
network (kDDN) 

We represent the condition-specific biological networks 
as graphs. Suppose there are $p$ nodes (genes) in the network 
of interest, and we denote the vertex set as $V$. Let 
$G^{(1)} = (V, E^{(1)})$ and $G^{(2)} = (V, E^{(2)})$ be the two undirected 
graphs under the two conditions. $G^{(1)}$ and $G^{(2)}$ have the 
same vertex set $V$, and condition-specific edge sets $E^{(1)}$ 
and $E^{(2)}$. The edge changes indicated by the differences 
between $E^{(1)}$ and $E^{(2)}$ are of particular interest, since 
such rewiring may reveal pivotal information on how 
the organisms respond to different conditions. We label 
the edges as common edges or specific to a particular 
condition in graph $G = (V, E)$ to represent the learned 
networks under the two conditions. 
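
The edge labeling described above can be sketched in a few lines. This is only an illustration, assuming each condition-specific edge set is given as a list of gene pairs; the function names and gene labels are hypothetical and are not part of the kDDN package.

```python
def normalize(edges):
    """Store each undirected edge as a frozenset so that (a, b) equals (b, a)."""
    return {frozenset(edge) for edge in edges}


def label_edges(edges_cond1, edges_cond2):
    """Split two condition-specific edge sets into common and differential edges."""
    e1, e2 = normalize(edges_cond1), normalize(edges_cond2)
    return {
        "common": e1 & e2,            # present under both conditions
        "condition1_only": e1 - e2,   # differential edges specific to condition 1
        "condition2_only": e2 - e1,   # differential edges specific to condition 2
    }


# Hypothetical toy input: genes A-D, one edge rewired between conditions.
labels = label_edges([("A", "B"), ("B", "C")], [("B", "A"), ("C", "D")])
```

The two "only" sets together form the differential network; the "common" set is the conserved part.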

Prior knowledge on biological networks is obtained 
from well-established databases such as KEGG [19] and 
is represented as a knowledge graph $G_W = (V, E_W)$, 
where the vertex set $V$ is the same set of nodes (genes) 
and the edge set $E_W$ over $V$ is translated from prior 
knowledge. There are many alternatives to extract existing 
domain knowledge, e.g., STRING, HPRD, or manual 
construction. The adjacency matrix of $G_W$, $W \in \mathbb{R}^{p \times p}$, 
is used to encode the prior knowledge. The elements of 
$W$ are either 1 or 0, with $W_{ji} = 1$ indicating the existence 
of an edge from the $j$-th gene to the $i$-th gene (or their gene 
products), where $i, j = 1, 2, \ldots, p$. $W$ is symmetric if 
the prior knowledge is not directed. 
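
As a small illustration of this encoding, the sketch below builds a 0/1 knowledge matrix from a list of prior edges. The function name, the `directed` flag, and the gene labels are assumptions made for this example; how edges are actually extracted from KEGG or other databases is not shown.

```python
def knowledge_matrix(genes, prior_edges, directed=False):
    """Build W with W[j][i] = 1 when prior knowledge has an edge from gene j to gene i."""
    index = {gene: k for k, gene in enumerate(genes)}
    p = len(genes)
    W = [[0] * p for _ in range(p)]
    for source, target in prior_edges:
        j, i = index[source], index[target]
        W[j][i] = 1
        if not directed:
            W[i][j] = 1  # undirected knowledge gives a symmetric W
    return W


# Hypothetical three-gene example with a single prior edge g1 - g2.
W_undirected = knowledge_matrix(["g1", "g2", "g3"], [("g1", "g2")])
W_directed = knowledge_matrix(["g1", "g2", "g3"], [("g1", "g2")], directed=True)
```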

The main task in this paper is to infer from data 
and prior knowledge $G_W$ the condition-specific edge 
sets $E$ (both $E^{(1)}$ and $E^{(2)}$). The method is illustrated 
in Figure 1. 

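We consider the $p$ nodes in $V$ as random variables, 
and denote them as $X_1, X_2, \ldots, X_p$. Suppose there are $N_1$ 
samples under condition 1 and $N_2$ samples under condition 2. 
Without loss of generality, we assume $N_1 = N_2 = N$. 
Under the first condition, for variable $X_i$, we have observations 
$x_i^{(1)} \in \mathbb{R}^N$, and under the second condition observations 
$x_i^{(2)} \in \mathbb{R}^N$. Collecting the observations of all variables under 
condition $c$ in the data matrix $X^{(c)} = [x_1^{(c)}, \ldots, x_p^{(c)}]$, $c = 1, 2$, we write 

$$y_i = \begin{bmatrix} x_i^{(1)} \\ x_i^{(2)} \end{bmatrix}, \qquad X = \begin{bmatrix} X^{(1)} & 0 \\ 0 & X^{(2)} \end{bmatrix}, \qquad (1)$$

and 

$$\beta_i = \begin{bmatrix} \beta_i^{(1)} \\ \beta_i^{(2)} \end{bmatrix}, \qquad (2)$$

with the non-zero elements of $\beta_i^{(1)}$ indicating the neighbours 
of the $i$-th node under the first condition and the 
non-zero elements of $\beta_i^{(2)}$ indicating the neighbours of 
the $i$-th node under the second condition. 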

The problem of simultaneously learning network 
structures and their changes under two conditions is formulated 
as a regularized linear regression problem with 
sparse constraints and solved by convex optimization. 
For each node (variable) $X_i$, $i = 1, 2, \ldots, p$, we solve the 
optimization with the objective function 

$$f(\beta_i) = \frac{1}{2}\|y_i - X\beta_i\|_2^2 + \lambda_1 \sum_{j=1}^{p} (1 - W_{ji}\theta)\left(|\beta_{ji}^{(1)}| + |\beta_{ji}^{(2)}|\right) + \lambda_2 \|\beta_i^{(1)} - \beta_i^{(2)}\|_1. \qquad (3)$$

The non-zero elements in $W$ introduce knowledge to 
the objective function (3), and $\theta$ is an $\ell_1$ penalty relaxation 
parameter taking values in [0, 1]. 

The solution is obtained by minimizing (3), 

$$\hat{\beta}_i = \arg\min_{\beta_i} f(\beta_i) = \arg\min_{\beta_i^{(1)},\, \beta_i^{(2)}} \frac{1}{2}\|y_i - X\beta_i\|_2^2 + \lambda_1 \sum_{j=1}^{p} (1 - W_{ji}\theta)\left(|\beta_{ji}^{(1)}| + |\beta_{ji}^{(2)}|\right) + \lambda_2 \|\beta_i^{(1)} - \beta_i^{(2)}\|_1 \quad \text{s.t. } \beta_{ii}^{(1)} = 0,\ \beta_{ii}^{(2)} = 0. \qquad (4)$$

Both the cost function $\frac{1}{2}\|y_i - X\beta_i\|_2^2$ and the two 
regularization terms, $\lambda_1 \sum_{j} (1 - W_{ji}\theta)(|\beta_{ji}^{(1)}| + |\beta_{ji}^{(2)}|)$ and 
$\lambda_2 \|\beta_i^{(1)} - \beta_i^{(2)}\|_1$, that coexist in the objective function are convex, 
and this convex formulation leads to an efficient algorithm. 
The structures of the graphical model under 
two conditions are obtained jointly by solving (4) sequentially 
for all nodes. The inconsistency between $\beta_i^{(1)}$ 
and $\beta_i^{(2)}$ highlights the structural changes between two 
conditions, and the collection of differential edges forms 
the differential dependency network. 

Figure 1 Knowledge-fused differential dependency network learning. The algorithm takes condition-specific data and prior knowledge as 
input and infers condition-specific networks. Black edges are common edges. Red and green edges are differential edges specific to conditions. 
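
To make the three terms of the objective concrete, the following sketch evaluates it for a single node. It is a minimal illustration and not the authors' solver: it assumes a squared-error loss with a factor of 1/2, a block-diagonal design so that the fit splits into one residual sum per condition, and hypothetical argument names. It does not enforce the self-exclusion constraint; the caller is expected to keep the node's own coefficient at zero.

```python
def kddn_objective(y1, y2, X1, X2, beta1, beta2, w_col, theta, lam1, lam2):
    """Evaluate the penalized least-squares objective for one node.

    y1, y2       : observations of the node under conditions 1 and 2
    X1, X2       : data matrices (lists of rows) under conditions 1 and 2
    beta1, beta2 : coefficient vectors for the two conditions
    w_col        : prior knowledge entries, w_col[j] = 1 if an edge j -> node is known
    """
    def residual_ss(y, X, beta):
        return sum(
            (y_n - sum(x * b for x, b in zip(row, beta))) ** 2
            for y_n, row in zip(y, X)
        )

    # Data fit: with a block-diagonal design the loss is a sum over conditions.
    fit = 0.5 * (residual_ss(y1, X1, beta1) + residual_ss(y2, X2, beta2))
    # Sparsity penalty, relaxed by theta on edges supported by prior knowledge.
    sparsity = lam1 * sum(
        (1 - w * theta) * (abs(b1) + abs(b2))
        for w, b1, b2 in zip(w_col, beta1, beta2)
    )
    # Fusion penalty: discourages differences between the two conditions.
    fusion = lam2 * sum(abs(b1 - b2) for b1, b2 in zip(beta1, beta2))
    return fit + sparsity + fusion


# Hypothetical toy values: perfect fit, one known edge, one differential coefficient.
identity = [[1.0, 0.0], [0.0, 1.0]]
value = kddn_objective(
    y1=[1.0, 0.0], y2=[0.0, 0.0], X1=identity, X2=identity,
    beta1=[1.0, 0.0], beta2=[0.0, 0.0],
    w_col=[1, 0], theta=0.5, lam1=1.0, lam2=2.0,
)
```

In this toy case the fit term is zero, the relaxed sparsity term contributes 0.5, and the fusion term contributes 2.0, so a coefficient that differs between conditions is paid for twice: once for being non-zero and once for being different.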

Given the vast search space and complexity in both con- 
served and differential networks, it is crucial for kDDN to 
identify statistically significant network changes and filter 
the structural and parametric inconsistencies due to noise 
in the data and limited samples. This objective is achieved 
by selecting the proper model, specified by $\lambda_1$ and $\lambda_2$, that 
best fits the data and meets the required statistical significance. $\lambda_1$ 
is determined by controlling the probability of falsely joining 
two distinct connectivity components of the graph [23], and 
$\lambda_2$ is found by setting differential edges to a defined 
significance level. We refer readers to Additional 
file 1: S4.1 for a detailed discussion of model parameter- 
setting approaches. 

With parameters specified, problem (4) can be solved efficiently 
by the block coordinate descent algorithm presented in 
Additional file 1: S4.3, Algorithm S1. 

Incorporation of prior knowledge 

The prior knowledge is explicitly incorporated into the 
formulation by $W_{ji}$ and $\theta$ in the block-wise weighted 
$\ell_1$-regularization term. $W_{ji} = 1$ indicates that the prior 
knowledge supports an edge from the $j$-th gene to the $i$-th 
gene, and $W_{ji} = 0$ otherwise. A proper $\theta$ will reduce the penalty 
applied to $\beta_{ji}^{(c)}$, $c = 1, 2$, corresponding to the connection 
between $X_j$ and $X_i$ with $W_{ji} = 1$. As a result, the connection 
between $X_j$ and $X_i$ will more likely be detected. 

$\theta$ is a weighting parameter on the influence of prior 
knowledge, determining the degree of knowledge incorporation 
in the network inference. When $\theta = 0$, the 
algorithm ignores all knowledge information and gives 
solely data-based results; conversely, when $\theta = 1$, the 
edge between $X_j$ and $X_i$ will always be included if such 
an edge exists in the prior knowledge. Therefore, the prior 
knowledge incorporation needs to find a proper balance 
between the experimental data and prior knowledge to 
achieve effective incorporation, as well as limit the adverse 
effects caused by any spurious edges contained in imperfect 
prior knowledge. 
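
The effect of the relaxation parameter can be seen in a deliberately simplified setting. The sketch below assumes a single coefficient, a single condition, an orthonormal design and no fusion term, in which case a lasso-type estimate reduces to soft-thresholding of the least-squares coefficient `z`. The function names are hypothetical, and this is not the block coordinate descent update used by kDDN.

```python
def soft_threshold(z, penalty):
    """Shrink z toward zero by `penalty`; values within the penalty become exactly zero."""
    if z > penalty:
        return z - penalty
    if z < -penalty:
        return z + penalty
    return 0.0


def edge_coefficient(z, lam1, w_ji, theta):
    """Coefficient of one candidate edge under the knowledge-relaxed penalty."""
    return soft_threshold(z, lam1 * (1 - w_ji * theta))


# Same data evidence (z = 0.3), with and without prior support for the edge.
without_prior = edge_coefficient(0.3, lam1=0.5, w_ji=0, theta=0.8)
with_prior = edge_coefficient(0.3, lam1=0.5, w_ji=1, theta=0.8)
full_trust = edge_coefficient(0.3, lam1=0.5, w_ji=1, theta=1.0)
```

Under these assumptions, weak evidence that is discarded without prior support survives once the penalty is relaxed, and with full relaxation any non-zero evidence is kept, which mirrors the two extremes described above.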

Here we propose a strategy to control the adverse ef- 
fects incurred in the worst-case scenario under which 
the given prior knowledge is totally random. In this case, 
the entropy of the knowledge distribution over the edges 
is maximized and the information introduced to the 
inference is minimal. Incorporating such random know- 
ledge, the inference results will deviate from the purely 
data-driven result. We want to maximize the incorpor- 
ation of relevant prior knowledge, while at the same 
time making sure the potential negative influence of ir- 
relevant prior knowledge is under control so that the ex- 
pected deviation is confined within an acceptable range 
in the worst-case scenario. To properly set the value of 
$\theta$, we assess the actual influence of prior knowledge for 
each value that $\theta$ may take, and develop Theorem 1 to 
determine the best degree of prior knowledge incorporation. 
This approach guarantees robustness even when the 
prior knowledge is highly inconsistent with the underlying 
ground truth. 

To quantify the effects of prior knowledge incorporation, 
we use graph edit distance [24] between two adjacency 
matrices as a measurement for the dissimilarity of two 
graphs. Let $G_T = (V, E_T)$ denote the ground-truth graph 
with edge set $E_T$, $G_X = (V, E_X)$ denote the graph learned 
purely from data, i.e., $W = 0$, and $G_{X,W_R,\theta} = (V, E_{X,W_R,\theta})$ 
denote the graph learned with prior knowledge. $W_R$ indicates 
that the prior knowledge is "random". Let $d(G_1, G_2)$ 
denote the graph edit distance between two graphs. Further, 
let $|E|$ be the number of edges in the graph $G$. 
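
For graphs that share the same labeled vertex set, one simple reading of graph edit distance is the number of edge insertions and deletions needed to turn one graph into the other. The sketch below uses that reading (unit cost per edge, no vertex edits), which is an assumption for illustration and may differ from the exact definition in [24]; the function names are hypothetical.

```python
def edge_set(adj):
    """Undirected edge set of a 0/1 adjacency matrix given as a list of rows."""
    p = len(adj)
    return {(i, j) for i in range(p) for j in range(i + 1, p) if adj[i][j] or adj[j][i]}


def graph_edit_distance(adj_a, adj_b):
    """Number of edges present in exactly one of the two graphs."""
    return len(edge_set(adj_a) ^ edge_set(adj_b))


def normalized_distance(adj_ref, adj_other):
    """Edit distance divided by the number of edges in the reference graph."""
    return graph_edit_distance(adj_ref, adj_other) / len(edge_set(adj_ref))


# Toy graphs on three nodes: edges {0-1, 1-2} versus {0-1, 0-2}.
graph_a = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
graph_b = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]
distance = graph_edit_distance(graph_a, graph_b)
rate = normalized_distance(graph_a, graph_b)
```

Dividing by the number of reference edges gives the error-rate style quantity used in the bounds that follow.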

Our objective is to bound the increase of inference 
error rate relative to the purely data-driven result, 
$d(G_T, G_{X,W_R,\theta})/|E_T| - d(G_T, G_X)/|E_T|$, within an acceptable 
range $\delta$ even if the prior knowledge is the worst 
case, by finding a proper $\theta$. 

Since $G_T$ is unknown, we instead control the increase in 
the error rate indirectly by evaluating the effect of random 
knowledge against $G_X$, the purely data-driven inference 
result. Specifically, we use a sampling-based algorithm to 
find the empirical distribution of $d(G_X, G_{X,W_R,\theta})$, and 
choose the largest $\theta \in [0, 1]$ that satisfies: 

$$\hat{\theta} = \max \theta \quad \text{s.t. } E[d(G_X, G_{X,W_R,\theta})]/|E_X| \le \delta, \qquad (5)$$

where $E[d(G_1, G_2)]$ is the expectation of the graph edit 
distance between graphs $G_1$ and $G_2$, with respect to its 
empirical distribution. 
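
The sampling idea behind (5) can be sketched as a grid search. This is not Algorithm S2 from the additional file: the callable `learn(W, theta)`, the grid of candidate values, the number of draws and the toy learner are all assumptions introduced for illustration. The sketch draws random knowledge matrices, measures how far each resulting graph moves from the data-only graph, and keeps the largest candidate whose average normalized distance stays within the tolerance.

```python
import random


def edge_set(adj):
    """Undirected edge set of a 0/1 adjacency matrix."""
    p = len(adj)
    return {(i, j) for i in range(p) for j in range(i + 1, p) if adj[i][j] or adj[j][i]}


def random_knowledge(p, n_edges, rng):
    """Sample a symmetric 0/1 matrix with n_edges undirected edges chosen uniformly."""
    pairs = [(i, j) for i in range(p) for j in range(i + 1, p)]
    W = [[0] * p for _ in range(p)]
    for i, j in rng.sample(pairs, n_edges):
        W[i][j] = W[j][i] = 1
    return W


def select_theta(learn, p, n_edges, delta, thetas, n_draws=100, seed=0):
    """Largest candidate theta whose mean normalized distance to the data-only graph is <= delta."""
    rng = random.Random(seed)
    e_x = edge_set(learn(None, 0.0))  # purely data-driven graph
    if not e_x:
        raise ValueError("the data-driven graph has no edges")
    best = 0.0
    for theta in thetas:
        total = 0
        for _ in range(n_draws):
            w_r = random_knowledge(p, n_edges, rng)
            total += len(e_x ^ edge_set(learn(w_r, theta)))
        if total / (n_draws * len(e_x)) <= delta:
            best = max(best, theta)
    return best


def toy_learn(W, theta):
    """Stand-in learner (hypothetical): fixed data edges, plus all prior edges once theta >= 0.5."""
    adj = [[0] * 4 for _ in range(4)]
    for i, j in [(0, 1), (1, 2)]:
        adj[i][j] = adj[j][i] = 1
    if W is not None and theta >= 0.5:
        for i in range(4):
            for j in range(4):
                if W[i][j]:
                    adj[i][j] = 1
    return adj


chosen = select_theta(toy_learn, p=4, n_edges=3, delta=0.1, thetas=[0.0, 0.25, 0.5, 0.75])
```

With the toy learner, random knowledge changes nothing below 0.5 and always adds at least one spurious edge from 0.5 upward, so the search settles on 0.25. A real learner would replace `toy_learn` with a call that solves the regression problem for every node.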
A natural question is whether using $G_X$ instead of $G_T$ 
to control the increase in the error rate induced by random 
knowledge is legitimate. To answer this question, 
we show in Theorem 1 (proof included in Additional 
file 1: S2) that the $\theta$ obtained in (5) in fact controls an 
upper bound of $E[d(G_T, G_{X,W_R,\theta})]/|E_T|$, i.e., the increase 
in the network inference error rate induced by random 
prior knowledge (the worst-case scenario), under the assumption 
that the number of false negatives (FN) in the 
data-driven result $G_X$ is smaller than the number of 
false positives (FP). As we adopt a strategy to control 
the probability of falsely joining two distinct connectivity 
components [23], this assumption generally holds. 

Theorem 1 establishes the relationship between the prior 
knowledge incorporation parameter $\theta$ and the adverse effects of 
prior knowledge on network inference, quantified by $\delta$ 
under the worst-case scenario (when the prior knowledge 
is completely irrelevant). For example, $\delta = 0.1$ indicates 
that the user can accept at most 10% performance 
degradation if the prior knowledge is complete noise. 
With $\theta$ estimated at $\delta = 0.1$, even if the prior knowledge 
is totally random, the performance will decrease by 
no more than 10%, while the relevant portion of the real 
prior knowledge (better than random noise) can greatly 
improve the overall network inference performance. 

Theorem 1 

For a given $\delta \in [0, 1)$, if the prior knowledge incorporation 
parameter $\theta$ satisfies the inequality 

$$\frac{E[d(G_X, G_{X,W_R,\theta})]}{|E_X|} \le \delta,$$

then the increase in the error rate induced by incorporating 
random prior knowledge is bounded by $\delta$; more 
specifically, 

$$\frac{E[d(G_T, G_{X,W_R,\theta})]}{|E_T|} - \frac{d(G_T, G_X)}{|E_T|} \le \delta.$$

Given the number of edges specified in the prior 
knowledge, procedures to compute $\theta$ are detailed in 
Algorithm S2 in Additional file 1: S4.3. 

Results and discussion 

We demonstrated the utility of kDDN using both simu- 
lation data and real biological data. In the simulation 
study, we tested our method on networks with different 
sizes and compared with peer methods the performance 
of overall network structure recovery, differential net- 
work identification and tolerance of false positives in the 
prior knowledge. 

In a real data application, we used kDDN to learn the 
network rewiring of the cell cycle pathway of budding 
yeast in response to oxidative stress. A second real data 



application was the study of the differential apoptotic 
signaling between recurring and non-recurring breast 
cancer tumors. Applications to study muscular dystrophy 
and transcription factor binding schemes are included in 
Additional file 1: S6. 

Simulation study 

We constructed a Gaussian Markov random field with 
$p = 100$ nodes and 150 samples following the approach 
used in [23], and then randomly modified 10% of the edges 
to create two condition-specific networks with sparse 
changes. The details of simulation data generation proced- 
ure are provided in Additional file 1: S5.1. The number of 
edges in prior knowledge M was set to be the number of 
common edges in the two condition-specific networks, and 
$\delta$ was set to 0.1. 

To assess the effectiveness of prior knowledge incorp- 
oration and robustness of kDDN when false positive 
edges were present in prior knowledge, we examined the 
network inference precision and recall of the overall net- 
work and the differential network at different levels of 
false positive rate in the prior knowledge. 

Both false positives and false negatives in the prior 
knowledge here are with respect to the condition-specific 
ground truth from which the data are generated. Thus, al- 
though false positives in prior knowledge may contribute 
more learning errors, false negatives will not worsen net- 
work learning performance (results shown in Additional 
file 1: S5.5). 

Starting from prior knowledge without any false posi- 
tive edges, we gradually increased the false positive rate 
in prior knowledge until all prior knowledge was false. 
At each given false positive rate in the prior knowledge, 
we randomly created 1,000 sets of prior knowledge, and 
compared the performance of kDDN in terms of precision 
and recall with two baselines: (1) a purely data-driven result, 
corresponding to kDDN with $\theta = 0$, i.e., without using 
any prior knowledge in the network inference (using only 
data for network inference); and (2) a naive baseline of 
knowledge incorporation by superimposing (union) the 
prior knowledge network and the network inferred purely 
from the data. 
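
The evaluation protocol can be illustrated with a small sketch. It assumes that the "false positive rate in prior knowledge" means the fraction of prior edges absent from the ground truth, and that edges are stored as index pairs; the function names and toy edge sets are hypothetical, and the actual simulation code is described in the additional file.

```python
import random


def make_prior(true_edges, non_edges, n_edges, fp_rate, rng):
    """Draw n_edges prior edges, a fraction fp_rate of which are not in the ground truth."""
    n_false = round(n_edges * fp_rate)
    n_true = n_edges - n_false
    chosen_true = rng.sample(sorted(true_edges), n_true)
    chosen_false = rng.sample(sorted(non_edges), n_false)
    return set(chosen_true) | set(chosen_false)


def naive_union(data_edges, prior_edges):
    """Naive baseline: accept every edge found in the data or listed in the prior knowledge."""
    return set(data_edges) | set(prior_edges)


def precision_recall(predicted, truth):
    """Precision and recall of a predicted edge set against the ground truth."""
    tp = len(predicted & truth)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(truth) if truth else 0.0
    return precision, recall


# Toy ground truth (a 5-node path) and four candidate false edges.
true_edges = {(0, 1), (1, 2), (2, 3), (3, 4)}
non_edges = {(0, 2), (0, 3), (0, 4), (1, 3)}
data_edges = {(0, 1), (1, 2)}  # hypothetical data-only result

all_false_prior = make_prior(true_edges, non_edges, 4, 1.0, random.Random(1))
precision, recall = precision_recall(naive_union(data_edges, all_false_prior), true_edges)
```

In this toy case the naive union keeps the recall of the data-only result but its precision drops to one third, because every false prior edge is accepted without checking the data.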

The results of the overall network (both common and 
differential edges) learning are shown in Figure 2(a) and 
(b). The dot-connected lines are averaged precision or 
recall of the network learned with 1,000 sets of prior 
knowledge. The box plot shows the first, second and 
third quartiles of precision or recall at each false positive 
rate in prior knowledge (with the ends of the whiskers 
extending to the lowest datum within a 1.5 interquartile 
range of the lower quartile, and the highest datum within 
a 1.5 interquartile range of the upper quartile). The blue 
squared lines, brown circle lines, and red diamond lines 
indicate the mean performance of kDDN, purely data-driven 
baseline, and naive baseline, respectively. Narrower 
lines with the same colors and line styles mark one 
standard deviation of the performance of the corresponding 
approach. 

Figure 2 Effects of false positive rates in prior knowledge on inference of the overall network. (a) Precision of overall network inference. 
(b) Recall of overall network inference. The experiments show that true knowledge improves both precision and recall of overall network 
inference, and the maximum degradation of inference results is bounded when the prior knowledge is imperfect. 

Precision reflects the robustness to the false positive 
edges and efficiency of utilizing the information in prior 
knowledge. Figure 2(a) shows that, as expected, the false 
positive rate in prior knowledge has a limited effect on 
the precision of kDDN (blue squared lines). With more 
false positives in the prior knowledge, the precision de- 
creases but is still around the purely data-driven baseline 
(brown circle lines) and much better than the naive 
baseline (red diamond lines). The naive baseline suffers 
significantly from the false positives in prior knowledge, 
because it indiscriminately accepts all edges in prior 
knowledge without considering evidence in the data. 
This observation corroborates the design of our method: 
to control the false detection incurred by the false posi- 
tives in the prior knowledge. At the point where the false 
positive rate in the prior knowledge is 100%, the de- 
crease of precision compared with the purely data-based 
result is bounded within $\delta$. 

Recall reflects the ability of prior knowledge in helping 
recover missing edges. Figure 2(b) shows that when the 
prior knowledge is 100% false, the recall is the same as 
that of the purely data-driven result, because in this case 
the prior knowledge brings in no useful information for 
correct edge detection. When the true positive edges are 
included in the prior knowledge, the recall becomes higher 
than that of the purely data-based result, because more 
edges are correctly detected by harnessing the correct in- 
formation in the prior knowledge. The naive baseline is 
slightly higher in recall, since it calls an edge as long as 
knowledge contains it, while kDDN calls an edge only 
when both knowledge and data evidence are present. The 
closeness between kDDN and naive baseline demonstrates 



the high efficiency of our method in utilizing the true infor- 
mation in prior knowledge. 

We then evaluated the effect of knowledge incorpor- 
ation solely on the identification of differential networks 
following the same protocol. The results are shown in 
Figure 3 with the same color and line annotations. 

For differential network recovery, the naive baseline is 
almost identical to purely data-driven results because 
the prior knowledge seldom includes a differential edge 
in a large network with sparse changes. While similar 
advantages of kDDN apply, our method has better 
precision and recall, and bounds the performance deg- 
radation when knowledge is totally wrong. Unlike the 
naive baseline where knowledge and data are not linked, 
we model the inference with knowledge and data together, 
so knowledge is also able to help identify differential 
edges. Performance evaluation results in Additional file 1: 
S5.3 studied networks with varying sizes, reaching consist- 
ent conclusions. 

Depending on specific conditions, false positives in 
prior knowledge may not distribute uniformly, but tend 
to aggregate more towards certain nodes. Experiments 
with biased knowledge distribution shown in Additional 
file 1: S5.4, Figures S10-S13 indicate no difference or lit- 
tle improvement compared to random knowledge, con- 
firming that random knowledge represents the worst- 
case scenario and the bound according to random know- 
ledge is sufficient. 

Performance comparisons with peer methods 

We compared our joint learning method kDDN with four 
peer methods: 1) DDN (independent learning) [13], 2) 
csLearner (joint learning) [15], 3) Meinshausen's method 
(independent learning) [23], and 4) Tesla (joint learning) 
[18]. csLearner can learn more than two networks but we 
restricted the condition to two. Meinshausen's method 
learns the network under a single condition, and we combined 
the results learned under each condition to get the conserved 
network and differential network. Tesla learns a 
time-evolving network, but can be adapted to two-condition 
learning as well. Only kDDN can assign edge-specific 
p-values to differential edges. 

Figure 3 Effects of false positive rates in prior knowledge on inference of the differential network. (a) Precision of differential network 
inference. (b) Recall of differential network inference. The experiments show that true knowledge improves both precision and recall of 
differential network inference. The maximum degradation of inference results is bounded when the prior knowledge is imperfect. 

Parameters in kDDN are automatically inferred from 
data as described in Additional file 1: S4.1. For the com- 
peting methods in the comparison, we manually tested 
and tuned their parameters to obtain their best perform- 
ance. We set DDN to detect pair-wise dependencies. 
The number of neighbors in csLearner is set to "4" (the 
ground truth value). Meinshausen's method uses the same 
$\lambda_1$ as inferred by kDDN, as it is a special case of kDDN 
under one condition without prior knowledge. Tesla uses 
the empirically determined optimal parameter values, 
since its parameter selection is not automatic but relies 
on user input. 

To assess the impact of prior knowledge, we ran kDDN 
under three scenarios: data-only (kDDN.dt), data plus true 
prior knowledge (kDDN.tk), and data plus "random" prior 
knowledge (kDDN.fk). Only kDDN is able to utilize prior 
knowledge. 

The ground truth networks consisted of 80, 100, 120, 
140 and 160 nodes, respectively, and correspondingly 120, 
150, 200, 200 and 240 samples. For each network size, 100 
replicate simulation networks were generated. We evaluated 
the performance of inferring both overall and differential 
edges of the underlying networks using the F-score (harmonic 
mean of precision and recall, $2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$) 
and precision-recall averaged over all datasets under each 
network size. 
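
For reference, the F-score used here is the standard harmonic mean of precision and recall. A minimal helper is shown below; the function name and the convention of returning 0 when both inputs are 0 are choices made for this sketch.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall; defined as 0.0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# A method with perfect recall but only half of its predicted edges correct.
example = f_score(0.5, 1.0)
```

Because it is a harmonic mean, the score is pulled toward the weaker of the two quantities, so a method cannot score well by maximizing recall alone.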

The results are presented in Figure 4 using bar plots. 
The color annotations are: orange-csLearner, golden-DDN, 
olive green-kDDN.dt, aquamarine-kDDN.fk, blue-kDDN.tk, 
purple-Meinshausen, magenta-Tesla. We used 
one-sided t-tests to assess whether kDDN performs better 
than the peer methods across all network sizes. The 
null hypothesis is that there is no difference between 
the mean F-score of kDDN and that of the peer methods. 
The alternative hypothesis is that kDDN has the greater 
mean F-score. The detailed results are included in 
Tables S1 and S2 in Additional file 1: S5.7, which show 
that kDDN.tk performs significantly better than peers in 
all cases, and kDDN.dt and kDDN.fk perform better 
than peers in 118 of 120 cases. 

Figure 4(a) compares the ability to recover overall 
networks. We see that kDDN.tk consistently outperforms all 
peer methods, and kDDN.dt and kDDN.fk perform 
comparably to Tesla (the best-performing peer method). The 
independent learning methods, DDN and Meinshausen's 
method, place third due to their inability to jointly 
use data. 

Figure 4(b) shows the comparison of performance on 
recovering differential edges. kDDN consistently outper- 
forms all peer methods under all scenarios. While kDDN 
and Tesla share some commonalities, they use different 
formulations. Where Tesla uses logistic regression, 
kDDN adopts linear regression to model the dependency. 
Such a difference also has implications for the subsequent 
solutions and outcomes. The fact that kDDN 
determines $\lambda_2$ according to the statistical significance 
of differential edges helps kDDN outperform Tesla in 
differential edge detection. It is also clear that a single- 
condition method cannot find the differential edges cor- 
rectly and has the worst performance. 

In Figures S17 and S18 in Additional file 1: S5.7, the 
performance of these methods is compared in terms of 
precision and recall; we reached the same conclusions. 

Through these comparisons, we show that kDDN 
performs better than peer methods in both overall and 
differential network learning. High-quality knowledge 
further improves kDDN performance, while kDDN is 
robust even to totally random prior knowledge. 
Joint learning, utilization of prior knowledge, and atten- 
tion to statistical significance all helped kDDN outper- 
form the other methods. 

Pathway rewiring in yeast uncovers cell cycle response to 
oxidative stress 

To test the utility of kDDN using real biological data, we 
applied the kDDN to the public data set GSE7645. This 
data set used budding yeast Saccharomyces cerevisiae to 
study the genome-wide response to oxidative stress im- 
posed by cumene hydroperoxide (CHP). Yeast cultures 
were grown in controlled batch conditions, in 1 L fermentors. 
Three replicate cultures in mid-exponential 
phase were exposed to 0.19 mM CHP, while three non-treated 
cultures were used as controls. Samples were 
collected at t = 0 (immediately before adding CHP) and 
at 3, 6, 12, 20, 40, 70 and 120 min after adding the oxidant. 
Samples were processed for RNA extraction and pro- 
filed using Affymetrix Yeast Genome S98 arrays. There 
were 48 samples in total, evenly divided between the 
treated and the non-treated groups. 



We analyzed the network changes of cell cycle-related 
genes with structural information from the KEGG yeast 
pathway as prior knowledge. We added the well-studied 
yeast oxidative stress response gene Yap1 [25-28] to the 
knowledge network and related connections gathered 
from the Saccharomyces Genome Database [29]. The 
learned differential network result is shown in Figure 5, 
in which nodes represent genes involved in the pathway 
rewiring, and edges show the condition-specific connec- 
tions. Red edges are connections in control and green 
edges are connections under stress. 

Figure 5 Differential dependency network in budding yeast reflects the cell cycle response to oxidative stress. Red edges are
connections in control and green edges are connections under stress; the two sets are mutually exclusive. Yap1, Rho1 and Msn4, three known
stress responders, are at the center of the inferred network response to oxidative stress. They are activated under oxidative stress
and many connections surrounding them are activated (green edges). Acting as an antioxidant in response to oxidative stress, Ctt1 coordinates
with Yap1 to protect cells from oxidative stress.

Oxidative stress is a harmful condition in cells, due to
the failure of the antioxidant defense system to effectively
remove reactive oxygen molecules and other oxidants.
The result shows that Yap1, Rho1 and Msn4 are
at the center of the network response to oxidative stress;
they are activated under oxidative stress and many connections
surrounding them are created (green edges).
Yap1 is a major transcription factor that responds to oxidative
stress [25-28]. Msn4 is considered a general responder
to environmental stresses including heat shock,
hydrogen peroxide, hyper-osmotic shock, and amino
acid starvation [30,31]. Rho1 is known to resist oxidative
damage and facilitate cell survival [32-34]. The involvement
of these central genes captures how yeast cells
sense and react to oxidative
stress. The edge between Yap1 and Ctt1 under stress
lends further confidence to the result. Ctt1 acts as an
antioxidant in response to oxidative stress [35], and the
coordination between Yap1 and Ctt1 in protecting cells
from oxidative stress is well established [36]. This result
depicts the dynamic response of yeast when exposed to
oxidative stress, and many predictions are supported by
previous studies. This real data study validated the effectiveness
of the method in revealing underlying mechanisms
and providing potentially novel insights. These
insights would be largely missed by conventional differential
expression analysis, as the important genes Rho1,
Msn4, Yap1 and Ctt1 rank 13th, 20th, 64th and 84th, respectively,
among all 86 involved genes based on t-test p-values. In a comparison
with data-only results (Additional file 1: S6.1), 14 differential
edges differ. We also applied the bootstrap
method of [37] to assess the robustness of the
findings, as detailed in Additional file 1: S6.2.
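
The contrast with differential expression analysis can be illustrated with a small sketch. It ranks genes by a pooled-variance two-sample t statistic, which is an assumption, since the text does not state which t-test variant was used. With identical group sizes for every gene, ordering by |t| is equivalent to ordering by two-sided p-value. The toy values and gene labels are hypothetical; they show a gene whose mean is unchanged, so it ranks low, while its dependency on a partner gene flips sign between conditions.

```python
import math
from statistics import mean, variance


def pooled_t(x, y):
    """Two-sample Student t statistic with pooled variance."""
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / math.sqrt(sp2 * (1 / nx + 1 / ny))


def rank_genes(expr_a, expr_b):
    """Rank genes from most (rank 1) to least differentially expressed by |t|."""
    scores = {g: abs(pooled_t(expr_a[g], expr_b[g])) for g in expr_a}
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {g: i + 1 for i, g in enumerate(ordered)}


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den


# Hypothetical toy expression values; these are not data from the paper.
control = {"shifted": [1, 2, 3, 4], "rewired": [1, 2, 3, 4], "partner": [1, 2, 3, 4]}
stress = {"shifted": [5, 6, 7, 8], "rewired": [4, 3, 2, 1], "partner": [1, 2, 3, 4]}

ranks = rank_genes(control, stress)
```

In this toy example the gene labeled "shifted" takes the top rank, whereas "rewired" has a t statistic of zero even though its correlation with "partner" changes from positive to negative. A mean-based test cannot detect a change of this kind.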

Apoptosis pathway in patients with early recurrent and 
non-recurrent breast cancer 

Network rewiring analysis is also applicable to mechanistic
studies and helps identify underlying key players
that cause phenotypic differences. For example, 50% of
estrogen receptor-positive breast cancers recur following
treatment, but the mechanisms involved in cancer recurrence
remain unknown. Understanding the mechanisms
of breast cancer recurrence can provide critical
information for early detection and prevention. We used
gene expression data from a clinical study [38] to learn 
differences in the apoptosis pathway in primary tumors 
between patients with recurrent and non-recurrent dis- 
ease. We compared the pathway changes in tumors ob- 
tained from patients whose breast cancer recurred within 
5 years after treatment and from patients who remained 
recurrence-free for at least 8 years after treatment. There 
were 47 and 48 tumor samples in the recurring and non- 
recurring groups, respectively. Gene expression data were 
generated using Affymetrix U133A arrays. We used the 
apoptosis pathway from KEGG as prior knowledge. 

Following the same presentation as in the yeast study, 
red edges are connections established in patients with re- 
current disease, and green edges are connections unique 
to patients without recurrent disease. Differences in the 
signaling among genes in the apoptosis pathway between 
patients whose cancer recurred or who remained cancer- 
free are shown in Figure 6. 

Three inflammatory/immune response genes (IL1B,
NFκB and TNFα) that are all linked to increased resistance
to breast cancer treatment were identified in the
recurrent tumors. These genes formed a path to inhibit
proapoptotic CASP3 and PPP3R1 [39], and to activate
the pro-survival genes PIK3R5 or CSF2RB that maintain
cell survival. In contrast, green edges that were present
in non-recurrent tumors form paths to both anti-apoptotic
XIAP/AKT2 and proapoptotic BAX and BAD
gene functions.

When we overlaid the differential network on the
KEGG [19] apoptosis pathway, we noticed additional
differences in the signaling patterns. Using the same
color-coded presentation, we show the learned differen- 
tial network in Figure 7. In the recurrent breast cancers 
(red edges), the molecular activities mainly affect the ini- 
tial apoptotic signals outside the cell and within the cell 
membrane (ligands and their receptors), while inside the 
cell there is no clear signaling cascade affected to determine
cell fate. The only route affected within the cell is
IL1B-induced inhibition of proapoptotic CASP3. In
non-recurrent breast cancer, the affected network involves
both signals received from activation of the membrane 
receptors and a cascade of signaling pathways inside the 
cell to promote both apoptosis and survival. Since a bal- 
ance between apoptosis and survival is necessary for dam- 
aged cells to be eliminated and repaired cells to survive 
[40], it is logical that both pathways would be activated 
concurrently. Interestingly, the imbalance of apoptotic
and survival signals and the inhibition of CASP3 in recurrent
cancer both lead to resistance to cell death,
reported as a major hallmark of cancer [41].

In conclusion, the apoptosis pathway rewiring analysis 
identified key mechanistic signaling differences in tu- 
mors from patients whose breast cancer did or did not 
recur. These differences provide a promising ground for 
novel hypotheses to determine factors affecting breast 
cancer recurrence. 

Conclusions 

To address the challenges concerning differential network 
inference using data-knowledge integrated approaches, we 
formulated the problem of learning the condition-specific 
network structure and topological changes as a convex 
optimization problem. Model regularization and prior 
knowledge were utilized to navigate through the vast solu- 
tion space. An efficient algorithm was developed to make 
the solution scalable by exploring the special structure of 
the problem. Prior knowledge was carefully and efficiently 
incorporated in seeking the balance between the prior 
knowledge support and data-derived evidence. The pro- 
posed method can efficiently utilize prior knowledge in the 
network inference while remaining robust to false positive 
edges in the knowledge. The statistical significance of re- 
wiring and desired type I error rate were assessed and vali- 
dated. We evaluated the proposed method using synthetic 
data sets in various cases to demonstrate the effectiveness 
of this method in learning both common and differential 
networks, and the simulation results further corroborated
our theoretical analysis. We then applied this approach to
yeast oxidative stress data to study the cellular dynamic re- 
sponse to this environmental stress by rewiring network 
structures. Results were highly consistent with previous 
findings, providing meaningful biological insights into the 
problem. Finally, we applied the methods to breast can- 
cer recurrence data and obtained biologically plausible 
results. In the future, we plan to incorporate more types 
of biological prior information, e.g., protein-DNA binding
information in ChIP-chip data and protein-protein
interaction data, to improve the use of condition-specific 
prior knowledge. 

Additional file 



Additional file 1: Supplementary methods and experimental 
results. Details of theoretical proofs and algorithms, more synthetic and
real data comparisons, and validations are included in this file. 